In [1]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
%matplotlib inline

In [2]:
train = pd.read_csv(r'C:\Users\hrao\Documents\Personal\HK\Python\train.csv')
test = pd.read_csv(r'C:\Users\hrao\Documents\Personal\HK\Python\test.csv')

The plot shows that the survival rate of female passengers was significantly higher than that of male passengers (each bar is the mean of Survived, i.e. a survival rate rather than a count). The survival rate was higher in first class than in any other class.

The survival rate was also lower in third class than in any other class.

The male survival rate in first class was about twice that in second or third class. The female survival rate in first class was about twice that in third class.


In [12]:
sns.barplot(x='Pclass',y='Survived',data=train, hue='Sex')


Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0xbf06588>
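
The "about twice" comparisons read off the bars can be checked numerically. The following is a minimal sketch, assuming the `Pclass`, `Sex` and `Survived` columns used in the plot; the helper name `survival_rates` and the toy rows are made up for illustration and are not the Titanic data.

```python
import pandas as pd


def survival_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of Survived (the survival rate) for each Pclass/Sex pair."""
    return df.pivot_table(
        index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
    )


# Toy rows for illustration only; NOT the Titanic data.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 3, 3, 3, 3],
    "Sex":      ["female", "female", "male", "male",
                 "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

rates = survival_rates(toy)
print(rates)
```

In the notebook, `survival_rates(train)` would give the same table that the bar heights represent, so the ratios between classes can be read directly instead of estimated by eye.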

The plot shows the same facts in a different arrangement: grouped by sex, with class as the hue.


In [13]:
sns.barplot(x='Sex',y='Survived',data=train, hue='Pclass')


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0xcf01cc0>

The plot shows the distribution of survivors across age and class (left swarm: Survived = 0, right swarm: Survived = 1). More red in the lower part of the left swarm indicates that younger passengers in third class had the least chance of survival.

More blue points in the upper part of the right swarm indicate that elderly passengers from first class had the best chance of survival.

The blue points in the right swarm are spread evenly across ages, indicating that first class passengers had better chances of survival irrespective of age.


In [30]:
sns.swarmplot(x='Survived',y='Age',hue='Pclass',data=train)


Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0xf74ac50>
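
Reading survival chances from the density of coloured points is approximate, so a table of survival rates per age band and class can serve as a cross-check. This is a sketch only: the helper name `survival_by_age_band`, the band edges (0, 16, 60, 100) and the band labels are assumptions chosen for illustration, and the toy rows are not the Titanic data.

```python
import pandas as pd


def survival_by_age_band(df: pd.DataFrame) -> pd.DataFrame:
    """Survival rate per age band (rows) and Pclass (columns)."""
    # Rows without an age cannot be placed in a band, so drop them.
    banded = df.dropna(subset=["Age"]).copy()
    # Hypothetical band edges: (0, 16], (16, 60], (60, 100].
    banded["AgeBand"] = pd.cut(
        banded["Age"], bins=[0, 16, 60, 100],
        labels=["child", "adult", "senior"],
    )
    grouped = banded.groupby(["Pclass", "AgeBand"], observed=True)
    return grouped["Survived"].mean().unstack("Pclass")


# Toy rows for illustration only; NOT the Titanic data.
toy = pd.DataFrame({
    "Age":      [5, 10, 30, 70, 8, 35, None],
    "Pclass":   [3, 3, 1, 1, 1, 3, 1],
    "Survived": [0, 1, 1, 1, 1, 0, 1],
})

table = survival_by_age_band(toy)
print(table)
```

Calling `survival_by_age_band(train)` in the notebook would show whether the pattern seen in the swarms holds once each band is normalised by its passenger count.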

The plot shows that male passengers had a lower chance of survival than female passengers.


In [33]:
sns.swarmplot(x='Survived',y='Age',hue='Sex',data=train)


Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0xf3d3a20>

The same ages grouped by sex. Note that this plot does not encode survival; it only shows the age distribution of male and female passengers.


In [38]:
sns.swarmplot(x='Sex',y='Age',data=train)


Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d2f128>

The plot shows the mean fare for each class of travel. A first class ticket costs about four times as much as a second class ticket.

A third class ticket costs about three quarters as much as a second class ticket.


In [36]:
sns.pointplot(x='Pclass',y='Fare',data=train)


Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0xfa42da0>
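
The fare ratios quoted above can be computed rather than estimated from the plot. A minimal sketch, assuming the `Pclass` and `Fare` columns used in the plot; the helper name `mean_fare_ratios` and the toy fares are hypothetical and are not the Titanic data.

```python
import pandas as pd


def mean_fare_ratios(df: pd.DataFrame, by: str, baseline) -> pd.Series:
    """Mean Fare per group, divided by the mean Fare of the baseline group."""
    means = df.groupby(by)["Fare"].mean()
    return means / means.loc[baseline]


# Toy fares for illustration only; NOT the Titanic data.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Fare":   [80.0, 88.0, 20.0, 22.0, 15.0, 16.5],
})

ratios = mean_fare_ratios(toy, by="Pclass", baseline=2)
print(ratios)
```

With the notebook's data, `mean_fare_ratios(train, by='Pclass', baseline=2)` expresses every class as a multiple of the second class mean fare.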

The plot shows differences in fares based on the point of embarkation.

Fares from Cherbourg were the highest, costing about twice as much as fares from Southampton and about three times as much as fares from Queenstown.

Fares from Southampton cost about twice as much as those from Queenstown.

C = Cherbourg, Q = Queenstown, S = Southampton 

In [48]:
sns.barplot(x='Embarked',y='Fare',data=train)


Out[48]:
<matplotlib.axes._subplots.AxesSubplot at 0x11005128>
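
To make the comparison between ports easier to read, the single-letter codes can be mapped to the port names listed above before averaging. This is a sketch under the assumption that `Embarked` holds only the codes C, Q and S (or missing values); the helper name `mean_fare_by_port` and the toy fares are made up for illustration and are not the Titanic data.

```python
import pandas as pd

# Code-to-name mapping taken from the legend above.
PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def mean_fare_by_port(df: pd.DataFrame) -> pd.Series:
    """Mean Fare per named port of embarkation, highest first."""
    named = df["Embarked"].map(PORTS)  # unknown or missing codes become NaN
    # groupby drops NaN keys by default, so rows without a port are ignored.
    return df.groupby(named)["Fare"].mean().sort_values(ascending=False)


# Toy fares for illustration only; NOT the Titanic data.
toy = pd.DataFrame({
    "Embarked": ["C", "C", "S", "S", "Q", None],
    "Fare":     [60.0, 60.0, 30.0, 30.0, 15.0, 100.0],
})

by_port = mean_fare_by_port(toy)
print(by_port)
```

Dividing `mean_fare_by_port(train)` by its smallest value would give each port's mean fare as a multiple of the cheapest port.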